Two-stage prosody prediction for emotional text-to-speech synthesis

Authors

Hao Tang

Xi Zhou

Matthias Odisio

Mark Hasegawa-Johnson

Thomas S. Huang

Abstract

In this paper, we adopt a difference approach to prosody prediction for emotional text-to-speech synthesis: the prosodic variations between emotional and neutral speech are decomposed into global and local prosodic variations, which are predicted by a two-stage model. The global prosodic variations are modeled by the means and standard deviations of the prosodic parameters, while the local prosodic variations are modeled by a classification and regression tree (CART) and dynamic programming. The proposed two-stage prosody prediction model has been successfully implemented as the prosodic module of an emotional text-to-speech synthesis system based on the Festival-MBROLA architecture, which is able to synthesize highly intelligible, natural and expressive speech.
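The abstract does not spell out how the two stages are computed. The sketch below is a minimal illustration under stated assumptions: the global stage is taken to shift and scale a neutral prosodic contour (for example F0 values in Hz) so that its mean and standard deviation match those of the target emotion, and the local stage is reduced to adding per-unit offsets that a trained model (such as the CART and dynamic-programming stage the paper describes) would predict. All function names, the z-score-style mapping, and the example numbers are hypothetical and are not the authors' implementation.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_global(values: Sequence[float]) -> Tuple[float, float]:
    """Global prosodic statistics: mean and population standard deviation."""
    return mean(values), pstdev(values)


def apply_global(neutral: Sequence[float],
                 neutral_stats: Tuple[float, float],
                 emotional_stats: Tuple[float, float]) -> List[float]:
    """Stage 1 (assumed mapping): move each neutral value to the same
    z-score position within the emotional mean/std distribution."""
    mu_n, sd_n = neutral_stats
    mu_e, sd_e = emotional_stats
    if sd_n == 0:
        raise ValueError("neutral standard deviation must be non-zero")
    return [mu_e + (x - mu_n) * (sd_e / sd_n) for x in neutral]


def apply_local(globally_mapped: Sequence[float],
                local_offsets: Sequence[float]) -> List[float]:
    """Stage 2 (hypothetical stand-in): add per-unit deviations that a
    trained local model would predict; here they are supplied directly."""
    if len(globally_mapped) != len(local_offsets):
        raise ValueError("one local offset is required per prosodic unit")
    return [g + d for g, d in zip(globally_mapped, local_offsets)]


if __name__ == "__main__":
    neutral_f0 = [100.0, 120.0, 140.0]  # hypothetical neutral F0 (Hz)
    stage1 = apply_global(neutral_f0, (120.0, 20.0), (180.0, 40.0))
    stage2 = apply_local(stage1, [5.0, 0.0, -5.0])  # hypothetical offsets
    print(stage1)  # [140.0, 180.0, 220.0]
    print(stage2)  # [145.0, 180.0, 215.0]
```

Splitting the prediction this way keeps the utterance-level shift in level and range separate from unit-level contour detail, which mirrors the global/local decomposition described in the abstract.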

Similar resources

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...

Experiments with signal-driven symbolic prosody for statistical parametric speech synthesis

This paper presents a preliminary study on the use of symbolic prosody extracted from the speech signal to improve parameters prediction on HMM-based speech synthesis. The relationship between the prosodic labelling and the actual prosody of the training data is usually ignored in the building phase of corpus based TTS voices. In this work, different systems have been trained using prosodic lab...

Rule-based Prosody Prediction for German Text-to-Speech Synthesis

This paper presents two empirical studies that examine the influence of different linguistic aspects on prosody in German. First, we analysed a German corpus with respect to the effect of syntax and information status on prosody. Second, we conducted a listening test which investigated the prosodic realisation of constituents in the German ’Vorfeld’ depending on their information status. The re...

Prosodic Fillers and Discourse Markers–Discourse Prosody and Text Prediction

Mandarin Chinese fluent speech prosody is characterized by a hierarchical multiple-phrase structure that specifies how speech paragraphs are constituted via Prosodic Phrase Grouping. Hence we view spoken discourse prosody as yet another higher node treats PGs (Prosodic Phrase Groups) as sister constituents. The goals of present study are two fold: one is to study how speech paragraphs are conne...

A corpus-based speech synthesis system with emotion

We propose a new approach to synthesizing emotional speech by a corpus-based concatenative speech synthesis system (ATR CHATR) using speech corpora of emotional speech. In this study, neither emotional-dependent prosody prediction nor signal processing per se is performed for emotional speech. Instead, a large speech corpus is created per emotion to synthesize speech with the appropriate emotio...

Publication date: 2008